Grammar string: a novel ncRNA secondary structure representation

Authors

  • Rujira Achawanantakun
  • Seyedeh Shohreh Takyar
  • Yanni Sun
Abstract

Multiple ncRNA alignment has important applications in homologous ncRNA consensus structure derivation, novel ncRNA identification, and known ncRNA classification. Because many ncRNAs' functions are determined by both their sequences and their secondary structures, accurate ncRNA alignment algorithms must maximize sequence and structural similarity simultaneously, which incurs a high computational cost. Faster secondary structure modeling and alignment methods using trees, graphs, and probability matrices have thus been developed. Despite promising results from existing ncRNA alignment tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce the grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG). Because a grammar string is a string defined on a special alphabet constructed from a CFG, it converts ncRNA alignment into sequence alignment with O(n^2) complexity. We align hundreds of ncRNA families from BraliBase 2.1 using grammar strings and compare the resulting consensus structures with those of Murlet, using the structures extracted from Rfam as reference. Our experiments show that grammar-string-based multiple sequence alignment competes favorably with Murlet in consensus structure quality. Source code and experimental data are available at http://www.cse.msu.edu/~yannisun/grammar-string.
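
To illustrate why reducing structural alignment to string alignment yields quadratic cost, the following is a minimal sketch of a standard Needleman-Wunsch global alignment over sequences of arbitrary symbols. It is not the authors' implementation: the function name `global_align`, the symbol names such as "S1" and "L2", the scoring function, and the gap penalty are all hypothetical placeholders, and the actual grammar-string alphabet and scoring scheme are defined in the paper, not here.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

GAP = "-"  # placeholder gap symbol (an assumption for this sketch)


def global_align(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    score: Callable[[Hashable, Hashable], int],
    gap: int = -2,
) -> Tuple[int, List[Hashable], List[Hashable]]:
    """Needleman-Wunsch global alignment of two symbol sequences.

    Fills an (len(a)+1) x (len(b)+1) table, so time and memory are
    O(n*m), i.e. O(n^2) for sequences of similar length.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align two symbols
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[Hashable] = []
    out_b: List[Hashable] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], out_a[::-1], out_b[::-1]


def simple_score(x: Hashable, y: Hashable) -> int:
    """Hypothetical scoring: +2 for identical symbols, -1 otherwise."""
    return 2 if x == y else -1
```

For example, `global_align(["S1", "L2", "S1"], ["S1", "S1"], simple_score)` aligns the two "S1" symbols and places a gap opposite "L2". Because each symbol is treated as an opaque token, the same routine works for any alphabet, which is the property the abstract relies on when it casts ncRNA alignment as sequence alignment.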

Related resources

Determine ncRNA structure shape using context-free grammar and support vector machine

Non-coding RNA molecules perform their cellular roles through their primary sequences as well as their secondary structures. A model of secondary structure is required to determine the mechanisms of action of ncRNA sequences. Secondary structure prediction mainly consists of determining all the possible base pairs of a given ncRNA sequence. In this work, we first make use of a structure prediction tool, Mfold to ...

Non-coding RNA finding based on probabilistic secondary structure information

Non-coding RNAs (ncRNAs) have been under intensive research focus for several years. Although much research has been done in this field since [1], little knowledge about ncRNAs has been gained so far. One reason is that there are not enough computational tools available for ncRNA analysis. ncRNA finding is one of the most important tools for the analysis. However, no viable solutio...

Visualizing RNA Secondary Structure Base Pair Binding Probabilities using Nested Concave Hulls

Challenge 1 of the BIOVIS 2015 design contest consists of designing an intuitive visual depiction of base-pair binding probabilities for the secondary structure of ncRNA. Our representation depicts the potential nucleotide pair bindings using nested concave hulls over the computed MFE ncRNA secondary structure. Thus, it allows identifying regions with a high level of uncertainty in the MFE com...

A novel method for comparing topological models of protein structures enhanced with ligand information

We introduce TOPS+ strings, a highly abstract string-based model of protein topology that permits efficient computation of structure comparison and can optionally represent ligand information. In this model, we consider loops as secondary structure elements (SSEs) as well as helices and strands; in addition, we represent ligands as first-class objects. Interactions between SSEs and b...

String-Tree Correspondence Grammar: A Declarative Grammar Formalism For Defining The Correspondence Between Strings Of Terms And Tree Structures

The paper introduces a grammar formalism for defining the set of sentences in a language, a set of labeled trees (not the derivation trees of the grammar) for the representation of the interpretation of the sentences, and the (possibly non-projective) correspondence between subtrees of each tree and substrings of the related sentence. The grammar formalism is motivated by the linguistic approac...

Publication date: 2010